AITopics | compositional transformer

Compositional Transformers for Scene Generation Supplementary Material

Neural Information Processing Systems, Apr-25-2026, 20:32:14 GMT

Figure 10: A visualization of the layouts and unsupervised depth maps produced by GANformer2's planning stage while synthesizing varied images, which makes the generative process more structured and interpretable. GANformer2 creates the layout sequentially, segment by segment, to capture the scene's compositionality, which allows objects to be added to or removed from the resulting images. Because GANformer2 creates each scene as a composition of interacting segments, it supports adding and removing objects while respecting their dependencies on their surroundings: amodal completion of occluded objects is marked in pink, updates of shadows and especially reflections in cyan, and other object-removal cases in yellow. Shape manipulation is marked in green and position changes in yellow. Color manipulation is marked in pink and material updates in cyan.
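The idea of a layout built from depth-ordered segments can be illustrated with a toy sketch. This is not GANformer2's planning stage, which is learned. It is a hand-written compositing rule, and the function name `compose_layout`, the segment names, and the pixel-set representation are all hypothetical. Each segment carries its full (amodal) footprint and a depth, and at every pixel the nearest segment is the visible one. Removing a segment and recomposing then reveals whatever lay behind it.

```python
def compose_layout(segments, height, width):
    """Return a height x width grid naming the visible segment at each pixel.

    segments: dict mapping a segment name to (pixels, depth), where pixels is
    a set of (row, col) positions covering the segment's full extent and a
    smaller depth means closer to the viewer. Uncovered pixels are None.
    (Hypothetical toy representation, not the model's learned layout.)
    """
    layout = [[None] * width for _ in range(height)]
    nearest = [[float("inf")] * width for _ in range(height)]
    for name, (pixels, depth) in segments.items():
        for r, c in pixels:
            # Keep the segment closest to the viewer at this pixel.
            if depth < nearest[r][c]:
                nearest[r][c] = depth
                layout[r][c] = name
    return layout


# Two overlapping segments on a 1 x 4 strip: a far "wall" and a near "chair".
segments = {
    "wall": ({(0, 0), (0, 1), (0, 2)}, 2.0),
    "chair": ({(0, 1), (0, 2), (0, 3)}, 1.0),
}

full = compose_layout(segments, 1, 4)

# Removing the chair and recomposing exposes the wall pixels it had occluded.
remaining = {k: v for k, v in segments.items() if k != "chair"}
without_chair = compose_layout(remaining, 1, 4)
```

In this toy setting, `full` shows the chair covering two wall pixels, and `without_chair` shows the wall's complete footprint, a crude analogue of the amodal completion the caption describes.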

artificial intelligence, layout, machine learning, (16 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)


Compositional Transformers for Scene Generation

Neural Information Processing Systems, Dec-24-2025, 02:43:37 GMT

We introduce the GANformer2 model, an iterative object-oriented transformer, explored for the task of generative modeling. The network incorporates strong and explicit structural priors to reflect the compositional nature of visual scenes, and synthesizes images through a sequential process. It operates in two stages: a fast and lightweight planning phase, where we draft a high-level scene layout, followed by an attention-based execution phase, where the layout is refined, evolving into a rich and detailed picture. Our model moves away from conventional black-box GAN architectures that feature a flat and monolithic latent space, towards a transparent design that encourages efficiency, controllability and interpretability. We demonstrate GANformer2's strengths and qualities through a careful evaluation over a range of datasets, from multi-object CLEVR scenes to the challenging COCO images, showing that it achieves state-of-the-art performance in terms of visual quality, diversity and consistency. Further experiments demonstrate the model's disentanglement and provide deeper insight into its generative process, as it proceeds step by step from a rough initial sketch, to a detailed layout that accounts for objects' depths and dependencies, and up to the final high-resolution depiction of vibrant and intricate real-world scenes.

compositional transformer, name change, scene generation, (3 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)


Compositional Transformers for Scene Generation

Neural Information Processing Systems, Oct-10-2024, 07:59:24 GMT

We introduce the GANformer2 model, an iterative object-oriented transformer, explored for the task of generative modeling. The network incorporates strong and explicit structural priors to reflect the compositional nature of visual scenes, and synthesizes images through a sequential process. It operates in two stages: a fast and lightweight planning phase, where we draft a high-level scene layout, followed by an attention-based execution phase, where the layout is refined, evolving into a rich and detailed picture. Our model moves away from conventional black-box GAN architectures that feature a flat and monolithic latent space, towards a transparent design that encourages efficiency, controllability and interpretability. We demonstrate GANformer2's strengths and qualities through a careful evaluation over a range of datasets, from multi-object CLEVR scenes to the challenging COCO images, showing that it achieves state-of-the-art performance in terms of visual quality, diversity and consistency.

compositional transformer, layout, scene generation


Technology: Information Technology > Artificial Intelligence > Machine Learning (0.42)
